The “done” and “coming soon” statuses are just to help me keep track of what I have put into my main portfolio Rmd file.
GitHub install
Git Bash install
RStudio install
#created MICB425_portfolio directory on my computer
#created new repository 'MICB_portfolio' on my GitHub account
git init
git add .
git commit -m "comment text" #comment was 'First commit'
git remote add origin [repository url] #URL was taken from repository page on GitHub
git remote -v #just to check that URL was correct
git push -u origin master
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
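R printed the sum in scientific notation because, by default, it shows only about 7 significant digits (controlled by `getOption("digits")`). If you ever want to see every digit, here is a small sketch using `sprintf()` and `format()`; the variable names are just for illustration. Doubles store whole numbers exactly up to \(2^{53}\), so this sum is held without rounding and only the printed form was abbreviated.

```r
# Add the same two numbers as above
total <- 1231521 + 12341556280987

# Default printing uses getOption("digits") (7 significant digits),
# which is why the result appears as 1.234156e+13
print(total)

# Print every digit by formatting as a fixed-point number with no decimals
full <- sprintf("%.0f", total)
print(full)

# Or turn off scientific notation and add thousands separators
with_commas <- format(total, big.mark = ",", scientific = FALSE)
print(with_commas)
```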
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
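`summary(cars)` returns a table of character strings, which is why each cell above carries a label such as `Min. : 4.0`. As an alternative sketch (not the approach the assignment asks for), applying `summary()` to each column with `sapply()` gives a numeric matrix with the statistic names as row names, which `kable()` renders as a cleaner table. The `knitr` call is wrapped in `requireNamespace()` so the chunk still runs if knitr is not installed.

```r
# cars is a built-in dataset: speed (mph) and stopping distance (ft)
data(cars)

# summary() on each column; sapply() simplifies the results into a
# 6 x 2 numeric matrix (rows = statistics, columns = variables)
stats <- sapply(cars, summary)
print(round(stats, 2))

# Render the matrix with kable only if knitr is available
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(stats, digits = 2,
                     caption = "Summary statistics for the cars dataset"))
}
```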
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
Evidence worksheet 03: Rockstrom 2009
atmospheric aerosol loading
Systems currently past their limit values:
| System | Variable | Limit Value | Current Value |
|---|---|---|---|
| Climate Change | CO2 | 350 ppm | 418 ppm (387 ppm at time of article) |
| Climate Change | Radiative Forcing | 1 W m\(^{-2}\) | 1.5 W m\(^{-2}\) |
| Biodiversity Loss | Species Loss Rate | 10x Background Rate | 100-1000x Background Rate |
| Nitrogen Cycle | N2 converted to NO3 or NH4 | \(35*10^6\) ton/yr | \(120*10^6\) ton/yr |
I thought it was quite straightforward. It was pretty haunting to see their quoted atmospheric CO2 concentration and think “Did they get that wrong?”, only to realize this was written 9 years ago and we have already pushed another 30 ppm past the limit proposed here.
Aquatic - The majority of aquatic prokaryotic life is found in the open ocean. These cells have a short turnover time and therefore a high cellular productivity, which means that mutations and other rare genetic events are more likely to occur here than in other habitats.
Subsurface - Major habitat for prokaryotes, with most of the subsurface biomass supported by organic matter deposited from the surface.
Soil - Major reservoir of organic carbon; prokaryotes are essential to decomposition in soil.
| Environment | Aquatic | Subsurface | Soil |
|---|---|---|---|
| Total abundance (cells) | \(1.18*10^{29}\) | \(3.8*10^{30}\) | \(2.556*10^{29}\) |
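A quick sketch to put the three totals from the table side by side: summing them and expressing each as a percentage shows how strongly the subsurface dominates. The values are copied from the table; the variable names are only for illustration.

```r
# Total prokaryotic abundance (cells) per environment, from the table
abundance <- c(aquatic = 1.18e29, subsurface = 3.8e30, soil = 2.556e29)

# Combined total across the three environments
total_cells <- sum(abundance)

# Share of the total held by each environment, in percent
share <- round(100 * abundance / total_cells, 1)

print(total_cells)
print(share)
```

With these numbers the subsurface holds roughly 91% of the cells counted across the three environments.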
Density: \(5*10^5\) cells/mL
Cyanobacteria: \((4*10^4 \text{ cells/mL})/(5*10^5 \text{ cells/mL}) * 100 = 8\%\)
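The same percentage can be checked in R. Both densities are in cells/mL, so the units cancel and the ratio is a dimensionless fraction; the variable names are just illustrative.

```r
# Cell densities in the upper 200 m of the ocean (cells/mL)
cyano_density <- 4e4   # cyanobacteria
total_density <- 5e5   # all prokaryotes

# Percentage of prokaryotic cells that are cyanobacteria
cyano_percent <- cyano_density / total_density * 100
print(cyano_percent)
```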
Cyanobacteria such as Prochlorococcus harvest energy from sunlight via photosynthesis, producing oxygen and fixing carbon in the process. Despite making up only 8% of the prokaryotic cells in the upper 200 m, they are responsible for approximately 50% of the oxygen in the atmosphere and contribute greatly to carbon cycling, as demonstrated by their quick turnover time, which results in the production of \(8.2 * 10^{29}\) cells/year.
Autotrophs - prokaryotes that produce their own food, primarily using energy from the sun. In this paper only marine autotrophs are considered, and the overwhelming majority of them are said to be Prochlorococcus.
Heterotrophs - use organic carbon as both an energy source and a carbon source. They are the overwhelming majority of cells on Earth.
Lithotrophs - prokaryotes that gain energy from inorganic compounds rather than from organic carbon or sunlight. They are said to be present in small numbers in the subsurface, where organic carbon still sustains most life.
Cells/year = Population Size * (turnover/year)
\(2.9*10^{27}cells * (365(days/year)/1.5days) = 7.1*10^{29}cells/year\)
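The turnover calculation can be wrapped in a small R function so it is easy to reuse for other populations. The function name `cell_production` is hypothetical; it simply implements Cells/year = Population Size * (turnover/year).

```r
# Cell production (cells/year) = population size * turnovers per year
cell_production <- function(population, turnover_days, days_per_year = 365) {
  population * (days_per_year / turnover_days)
}

# 2.9e27 cells turning over every 1.5 days
production <- cell_production(population = 2.9e27, turnover_days = 1.5)

# Round to 2 significant figures to match the hand calculation
print(signif(production, 2))
```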
Tectonic movement, along with photochemical reactions in the atmosphere, allows for mixing and partitioning of chemical substrates on Earth.
Biogeochemical (biotic): redox
Although there is enormous genetic diversity in nature, there remains a relatively stable set of core genes coding for the major redox reactions essential for life and biogeochemical cycles. Thus, microbial diversity does not necessarily entail diversity in proteins involved in metabolism.
It is hypothesized that there is limitless evolutionary diversity in nature. The rate of discovery of unique protein families has been proportional to the sampling effort, with the number of new protein families increasing approximately linearly with the number of new genomes sequenced.
“‘Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.’ Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.”
Microbial actions and human life are indisputably intertwined. Humans, however, are not wholly dependent on the ecological catalysis of microbial life for the survival and proliferation of our species. Prokaryotic life catalyzes several important biogeochemical cycles globally, providing immense shifts in the redox states of nearly all biologically important compounds via their metabolic processes. Currently, both human activities and microbial metabolisms contribute to nutrient cycling on Earth, with microbes bearing most of the weight, but this balance is changing at an accelerating rate.
Organisms with consciousness have the ability to apply their efforts directionally towards a specific change. As humans, we can make plans while taking future conditions into account, we can cooperate towards a goal, and we can place a value on the success of individuals and species beyond the self. Microbes are bound by the forces of natural selection, and changes must be beneficial to an individual cell in order to be passed on to the next generation. Human consciousness gives humanity the capacity to make faster and more drastic changes to the biogeochemical landscape than microbial life.
In the two centuries since the industrial revolution, human industry, technology, and understanding of the universe have increased exponentially. Compared to the temporal scale on which geological processes and shifts in net global microbial function occur, all of human progress is only a tiny blip on the tail end of history. Although human impact on biogeochemical cycles is minimal compared to that of prokaryotes in the present day, the pattern of exponential increase in humanity’s capacity to alter the environment, combined with recent emerging biotechnologies, makes for a compelling argument that microbes will become replaceable in the not so distant future.
The potential for human processes to excel beyond microbial processes is facilitated by consciousness. This emergent property gives humanity three main abilities that mitigate the normal evolutionary selection processes: the ability to cooperate towards a goal that may not immediately be beneficial to all contributors; the ability to take future impacts of actions into account when making current plans; and the ability to place an innate value on the lives of humans and other species, both through compassion and through economic market forces. Together, this means that people can establish how a system functions, allocate resources towards a change in said system, and enact a plan not only to change the system in a specific direction, but to change it in what is determined to be the best direction. Conversely, changes to net prokaryotic population function, another emergent property, are determined solely by whether a small change is beneficial to the ability of an individual to reproduce. Natural selection places limits on the capacity of microbes to enact change on biogeochemical cycles compared to humans.
For microbes, both the maximum potential for change to biogeochemical cycles and the rate at which this potential can increase have boundary limits. The rate at which global, net prokaryotic metabolism can change is limited by the rate at which cells divide and the rate at which the global microbial genetic pool can be altered. Although the estimated number of prokaryotic cells on Earth is astronomically large, on the order of \(10^{30}\), nearly all of them live in the subsurface and have average turnover times on the scale of centuries (Whitman et al., 1998). Within geological time scales, this ‘silent majority’ is extremely active and relevant, but on the time scale of modern human environmental intervention, the division rate of these hidden cells makes their tremendous abundance much less consequential.
Cell division is also limited by energy availability. The primary input of energy to global biological systems is photosynthetic carbon fixation by higher plants and photosynthetic bacteria. The rate at which sunlight is transformed to an ecologically available energy source by a given photosynthetic population and the rate at which this energy can be disseminated to other organisms in the deep ocean and terrestrial sediment both place boundaries on the maximum global rate of microbial production. Global metabolic catalysis is dependent on the energy supplied to biological processes, and microbes have physical limitations to both maximum energy production and energy transfer between cells.
Beyond cellular division, genetic variation is necessary for changes to prokaryotic metabolic function, which in turn determines the ability of the global prokaryotic population to alter biogeochemical cycles. Horizontal gene transfer and the extremely high abundance of cells on Earth make useful mutations extremely common, even on small temporal scales. It is estimated that four simultaneous mutations occur in a cell every half hour, in the surface ocean alone (Whitman et al., 1998); however, this does not mean that the pool of available genes changes quickly. For a mutation to be heritable, it cannot be lethal. This presents a hard limit on the extent of change that can happen to genetic sequences in a single generation. Proteins vital to the survival of an individual cell, such as the metabolic enzymes relevant to many biogeochemical cycles, cannot be completely changed by mutation to a single cell in a single generation. Instead, functional diversity is the cumulative change to sequences over long time periods. Natural selection in a varied pool of random mutations is a system that strongly favours improvement to existing structures over the introduction of truly novel ones. The core set of proteins that carry out the metabolic redox reactions driving global biogeochemical cycles was developed extremely early in the history of life on Earth, and is still highly conserved (Falkowski et al., 2008). Microbial populations have boundary conditions that limit metabolic rate and functional change, imposed both by the processes of mutation and selection and by the rate of energy acquisition and distribution within a biological system. In the context of human activity, these boundaries have different limits, and may even be completely overcome in the near future.
Recent history provides evidence of the potential for human activity to be the dominant controller of global nutrient cycles. Humanity has been raising the limits of energy acquisition and distribution since the first use of controlled fire, roughly one million years ago (Berna et al., 2012). The ability to obtain and use energy more efficiently has increased along with the development of human civilizations. The first agriculture marked the beginning of a steady march toward increasingly efficient conversion of sunlight to available food sources, and made the first large-scale energy distribution network necessary: the transport and trade of food. The beginning of the industrial revolution marked the shift away from human bodies as the primary means of energy conversion from chemical to other forms. Vast canal systems for coal distribution made up the second, higher-throughput energy distribution system. Finally, the discovery of electricity, along with the wide-scale adoption of oil as a fuel source, ushered in the third generation of power production and distribution. In modern times, vast amounts of energy are produced by a ‘metabolism’ of human activity. Electrical and chemical energy are distributed along global networks of wires, pipes, and roads.
Energy availability becomes less of a limitation to maximum human impact on biogeochemical cycles every year. At present, human industry already rivals microbial metabolisms in the magnitude of its influence on nitrogen and carbon cycling. Atmospheric carbon dioxide measurements show an interannual increase in carbon dioxide driven by anthropogenic combustion of fossil fuels, indicating that humanity already has the power to be the deciding factor in carbon cycling but has not yet implemented directional control (NOAA, 2018). Likewise, the Haber process has allowed human activity to synthetically reduce massive amounts of nitrogen gas to ammonia for use in agriculture. At the turn of the millennium, humans produced about half of all nitrogen fixed annually, and this value has been increasing exponentially since the 1940s (Rockstrom, 2009; Vitousek et al., 1997). Besides the conversion of organic matter to carbon dioxide and the conversion of nitrogen gas to ammonia, human industry has the capacity to upset nearly any step in global biogeochemical cycles, should the current microbial processes become insufficient.
Man-made fixed nitrogen and carbon dioxide have both increased exponentially as global energy production has risen. However, the limitation of energy availability will not last much longer. A crude exponential fit of global energy production from 1820 to 2010, extrapolated to the year 4000, shows that humanity would be consuming energy equal to the entire output of our sun in just another 1800 years if production continues on the trend set since the industrial revolution (Fig. 1). In all likelihood, a factor other than energy availability, such as the maximum carrying capacity for human life on Earth, will set a new limit on human progress long before the need for a Dyson sphere is reached, but the key point is that human energy production is virtually endless compared to the limited photosynthetic rate providing energy to microbial metabolism.
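An exponential fit like the one described above amounts to a linear regression on log-transformed values. The sketch below shows only the mechanics: it uses a synthetic series with an assumed 2% annual growth rate, not the real 1820-2010 energy data behind Fig. 1, so its numbers say nothing about actual energy use. The names `energy_data` and `extrapolate` are hypothetical.

```r
# Synthetic series for illustration only; these are NOT real energy data.
# Assumption: production grows 2% per year from an arbitrary value of 100.
energy_data <- data.frame(year = seq(1820, 2010, by = 10))
energy_data$energy <- 100 * exp(0.02 * (energy_data$year - 1820))

# An exponential fit is a straight-line fit on the log scale:
# log(energy) = a + r * year, where r is the annual growth rate
fit <- lm(log(energy) ~ year, data = energy_data)
growth_rate <- unname(coef(fit)["year"])

# Doubling time implied by the fitted growth rate (years)
doubling_time <- log(2) / growth_rate

# Extrapolate the fitted curve to any year by back-transforming
extrapolate <- function(year) {
  unname(exp(predict(fit, newdata = data.frame(year = year))))
}

print(c(growth_rate = growth_rate, doubling_time = doubling_time))
print(extrapolate(4000))
```

Fitting on the log scale keeps the model a simple `lm()` call; the trade-off is that extrapolating far beyond the data, as here, amplifies any error in the fitted rate enormously.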
If the energy available to humans far outpaces that available to microbes, the other factor at play is diversity of function and the rate at which it changes. The human analogue for the global microbial gene pool is the sum of human knowledge and available computational power. Computational power has increased exponentially since the first integrated circuits in the 1960s. Transistor density has followed Moore’s Law, doubling roughly every two years, although this trend is expected to stop in the near future as transistors become small enough for quantum effects, such as electron tunnelling, to cause leakage currents in microcircuits (Chien and Karamcheti, 2013). Gallium-based compounds and other alternatives to silicon are being explored to postpone the end of Moore’s Law, but these are all just stopgaps and eventually transistor density must plateau due to physical limitations. However, computational power can still increase exponentially without an increase in transistor density, as long as there is enough available energy to fabricate and run more computer chips. Energy production is not likely to reach a maximum limit before Moore’s Law ends, meaning future computational power will be tied to energy production, a value that has been increasing exponentially for two centuries.
The human analogue to microbial genetic diversity is the diversity of technologies available. The sum of human knowledge has increased exponentially as energy availability and societal changes have allowed for greater resource allocation to research. In particular, the science of microbiology is only in its infancy. Microbes were first observed a mere 400 years ago and medical microbiology exploded just 100 years ago. DNA was first imaged 70 years ago and molecular techniques in biology have become increasingly complex since then. Humanity’s collective understanding of how life functions, including how microbes impact geochemical cycles, has increased at an accelerating rate throughout all of human history. It is reasonable to expect that, given increased energy and resources going forward in time, knowledge of biotic chemistry will continue to increase exponentially. All of human history is a raindrop in the ocean of time, where significant changes to genetic diversity and geologic equilibria have occurred. Furthermore, the period of time since people began to tease apart the intricacies of life on a microscopic scale is only a molecule of water in that raindrop. Right now, new human technologies can develop considerably faster than new microbial functions. Humanity is at the tipping point of making prokaryotes obsolete.
With the assumption that anthropogenic energy production and knowledge of the universe will continue to increase exponentially into the next millennium, humanity is poised to make the metabolic catalysis of biogeochemical cycles by microbes unnecessary. Consciousness has allowed higher boundaries on rates of change to the environment for humans than for microbes, as evidenced by the current upsets to global carbon and nitrogen cycles. Potential for humans to alter biogeochemical processes will increase much faster than biological or geological systems will be able to adapt. The next millennium will mark the point where sufficient energy and technology will be available to humans to make prokaryotic processes antiquated and irrelevant.
Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578-6583. PMC33863
Rockstrom J, et al. 2009. A safe operating space for humanity. Nature. 461(7263):472-475. DOI: 10.1038/461472a
Falkowski PG, Fenchel T, DeLong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science. 320(5879):1034-1039. DOI: 10.1126/science.1153213
Berna et al. 2012. Microstratigraphic evidence of in situ fire in the Acheulean strata of Wonderwerk Cave, Northern Cape province, South Africa. PNAS. 109(20):E1215-E1220. DOI: 10.1073/pnas.1117620109
NOAA. 2018. Recent Monthly Mean CO2 at Mauna Loa.
Vitousek et al. 1997. Human Domination of Earth’s Ecosystems. Science. 277(5325):494-499. DOI: 10.1126/science.277.5325.494
Vlachogianni and Valavanidis. 2013. Energy and Environmental Impact on the Biosphere Energy Flow, Storage and Conversion in Human Civilization. Science and Education Publishing.
Chien and Karamcheti. 2013. Moore’s Law: The First Ending and a New Beginning. Computer. 46(12):48-53. DOI: 10.1109/MC.2013.431
In 2002, values of up to 500,000 were discussed. In 2016, values of millions to trillions were presented. Only 20% of prokaryotes are represented by cultured species.
Thousands available just from EBI and thousands more from hundreds of other sources. The main biomes sequenced are soil, human digestive tract, marine, and freshwater, but metagenomics projects exist for almost every conceivable niche environment.
Martinez A et al. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. PNAS. 104(13):5590-95. DOI: 10.1073/pnas.0611470104
Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
Explain the relationship between microdiversity, genomic diversity and metabolic potential
Comment on the forces mediating divergence and cohesion in natural microbial communities
Candy community counting
In order for a sample to be diverse, individuals must be divided into groups based on some sort of differentiation. The species definition determines how these groups are chosen. Both the Simpson Index and Chao1 richness values change if the number of species changes. This means that if your species definition is more granular and has a higher taxonomic resolution, a sample will appear more diverse than the same sample analysed using a coarser definition of what a species is.
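A minimal sketch of how the species definition shifts both indices. The candy counts below are hypothetical, not our group's actual data. Simpson's index is computed here in the Gini-Simpson form, \(1 - \sum p_i^2\), and Chao1 uses the bias-corrected form \(S_{obs} + F_1(F_1 - 1)/(2(F_2 + 1))\), where \(F_1\) and \(F_2\) are the numbers of species seen exactly once and exactly twice; other sources use slightly different variants of both.

```r
# Gini-Simpson diversity: probability that two individuals drawn at
# random (with replacement) belong to different species
simpson <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Bias-corrected Chao1 richness estimate
chao1 <- function(counts) {
  counts <- counts[counts > 0]
  f1 <- sum(counts == 1)   # singletons
  f2 <- sum(counts == 2)   # doubletons
  length(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Hypothetical counts with species defined by brand only
by_brand <- c(mms = 12, skittles = 8, gummy_bears = 3,
              jelly_belly = 1, caramel = 1)

# The same 25 candies, with M&M's split into milk and dark chocolate
# and Skittles split by colour
by_brand_and_type <- c(mms_milk = 10, mms_dark = 2,
                       skittles_red = 4, skittles_green = 3,
                       skittles_purple = 1, gummy_bears = 3,
                       jelly_belly = 1, caramel = 1)

results <- rbind(
  by_brand = c(simpson = simpson(by_brand), chao1 = chao1(by_brand)),
  by_brand_and_type = c(simpson = simpson(by_brand_and_type),
                        chao1 = chao1(by_brand_and_type))
)
print(round(results, 3))
```

Splitting the same candies into finer groups raises Simpson's index from about 0.65 to about 0.77 and Chao1 from 6 to 9.5, even though the community itself has not changed.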
We did not draw any differences based on colour, deciding that different colours marked different strains within the same species. If we divided species by colour as well as brand, we would have many more species in the same community. Alternatively, we could have grouped all gummy candies together and all chocolate candies together, or all round candies together, to end up with fewer total species. Certain very specific distinctions could also have been used to raise our species count, such as separating dark chocolate M&M’s from milk chocolate M&M’s into two species.
Sanger and Illumina sequencing both use PCR before sequencing, whether for raising the template concentration or for cluster generation. PCR can introduce changes to the sequence when Taq DNA polymerase mismatches a base, but these usually do not make it through to the final sequence because base calling in both systems is based on an integrated signal from many molecules at once. However, if an incorrect sequence was generated in an early PCR cycle, it could continue to be replicated and eventually compose a sizeable portion of the final sequence pool being observed. This could cause incorrect sequences to be reported. Another problem that PCR introduces is chimera generation. Sequences can recombine part way through replication and create new, hybrid sequences containing parts of two different original sequences. This new sequence will be replicated and eventually sequenced. Once sequences are obtained and it is time to bin them into species, PCR error will blur the edges of similar sequences, even if they should all be contained in one species. More importantly, chimeric sequences will appear as completely different organisms and drastically raise the number of species in a sample and therefore the diversity of the sample. Third generation sequencing based on single molecules, such as Oxford Nanopore, can avoid the issues introduced by PCR because amplification is no longer a necessary part of the sequencing process.
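A back-of-the-envelope sketch of why early PCR errors matter most, assuming idealized PCR in which every molecule is copied exactly once per cycle (real amplification efficiency is lower). If an error creates one mutant copy during cycle \(c\), starting from \(n_0\) template molecules, that copy is amplified for the remaining cycles along with everything else, so it ends up as \(1/(n_0 2^c)\) of the final pool regardless of the total number of cycles. The function name is hypothetical.

```r
# Fraction of the final PCR pool descended from a single erroneous copy
# made during cycle `error_cycle`, assuming perfect doubling every cycle
mutant_fraction <- function(error_cycle, total_cycles = 30, n0 = 1) {
  mutant_copies <- 2^(total_cycles - error_cycle)  # descendants of the error
  total_copies  <- n0 * 2^total_cycles             # whole final pool
  mutant_copies / total_copies
}

# Compare an error in the first cycle with errors in later cycles
cycles <- c(1, 5, 10, 20)
print(data.frame(cycle = cycles,
                 fraction = mutant_fraction(cycles, total_cycles = 30, n0 = 1)))
```

Under these assumptions, an error in the first cycle from a single template makes up half of the final pool, while the same error in cycle 10 makes up less than 0.1%.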
“Discuss the challenges involved in defining a microbial species and how HGT complicates matters, especially in the context of the evolution and phylogenetic distribution of microbial metabolic pathways. Can you comment on how HGT influences the maintenance of global biogeochemical cycles through time? Finally, do you think it is necessary to have a clear definition of a microbial species? Why or why not?”
The definition of “species” used in macroscopic biology is useless when applied to prokaryotes, as the definition relies on sexual reproduction producing viable offspring to determine whether two organisms belong to the same species. Currently, there is no clear definition of what a species is for prokaryotic organisms, for a plethora of compounding reasons. Prokaryotes show far too little morphological variation to distinguish among the global pool of microbial organisms, and the marker genes used in genetic definitions do not always retain their sequence across a whole species. On the other hand, there may not be enough differentiation in marker sequence even at the genus level to usefully separate organisms into appropriate species. Horizontal gene transfer (HGT) complicates the issue even further, as copies of DNA segments can spread across species and even genera, throwing a wrench into the workings of any species definition based on whole genome similarity. HGT frustrates the species definition aspect of microbiology, but it plays an important role in the maintenance of biogeochemical cycles. Environmental conditions may change such that a particular species will die out, but a horizontally transferred gene and its environmental function can persist in another species that is perhaps better suited to this new environment. Perhaps it is not even necessary to have a clear definition of prokaryotic species, outside of medical microbiology, because specific functions are what matter to humans. The net function of a microbial community matters more than which specific constituents make up the community. Tracking the evolution and diversity of genes over time may be more useful for environmental applications than tracking species, as it allows for direct observation of function.
Traditionally, multicellular organisms were classified into the same species if their offspring were fertile. For microbes, which reproduce asexually by dividing one organism into two new ones, this definition completely misses the mark. Any attempt to classify prokaryotes by morphological features is also futile, as the range of diagnostic physical characteristics in unicellular organisms is just too small. Instead, modern microbiology relies on genetic similarities to classify organisms into species. Unfortunately, this genetic system is also fraught with problems. Different methods of determining genetic similarity are used by different researchers, including DNA-DNA hybridization temperature, average nucleotide similarity across a genome, sequence similarity in a gene conserved across taxa, and sequence similarity in just one region of a gene (Kim et al., 2014). Since all of these definitions rely on some threshold value of similarity to determine whether a given sequence falls into one species or another, these arbitrarily chosen threshold values provide another point of variation in how species are defined by different researchers. When these thresholds are changed, there is a fundamental tradeoff between taxonomic resolution and the inclusion of organisms that would not normally be considered part of a certain species. Sequencing error and chimera generation can introduce false species into all genetic species definitions, limiting their utility as a system for determining diversity (Kunin et al., 2010).
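To illustrate how an arbitrarily chosen threshold changes species calls, here is a sketch that computes percent identity between two already-aligned, equal-length sequences and compares them at two cutoffs. The 97% cutoff is the value traditionally used for clustering 16S rRNA sequences into OTUs; the 99% cutoff is included only for comparison. The sequences and the function name are made up, and real pipelines align sequences first and treat gaps more carefully.

```r
# Percent identity between two aligned sequences of equal length.
# Positions where either sequence has a gap ("-") are ignored.
percent_identity <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  keep <- x != "-" & y != "-"
  100 * sum(x[keep] == y[keep]) / sum(keep)
}

# Hypothetical 50-base aligned fragments that differ at one position
seq1 <- paste(rep("ACGTACGTAC", 5), collapse = "")
seq2 <- seq1
substr(seq2, 25, 25) <- "G"   # introduce a single mismatch (A -> G)

pid <- percent_identity(seq1, seq2)

# Same pair of sequences, two different species thresholds
same_species <- c(`97%` = pid >= 97, `99%` = pid >= 99)
print(pid)
print(same_species)
```

At 98% identity, the two sequences count as one species under the 97% cutoff but as two species under the 99% cutoff, which is exactly the resolution tradeoff described above.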
A final source of trouble when determining a single, clean definition of a microbial species is HGT. When segments of DNA are exchanged between cells, relatively large chunks of sequence can be integrated into a new cell’s genome. Most researchers would say that an organism should still belong to the same species if all that has changed is the acquisition of a handful of additional genes among thousands of others. However, the functional capabilities of an organism can change dramatically, depending on which genes are incorporated (Martinez et al., 2007). Even commonly used taxonomic marker genes such as the 16S rRNA gene may be transferred horizontally (Wooley et al., 2010). One possible species definition that could account for HGT would use a metric of how many genes are shared between individuals. However, this option does not solve the issue of defining separation thresholds; it merely pushes it from the organism level to the gene level.
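The shared-gene idea can be sketched with R's set functions. Here the Jaccard index (genes shared by both genomes divided by all genes found in either) stands in as one possible measure; the gene names are placeholders, and the two cutoffs are arbitrary, which is the point: a threshold still has to be chosen. A real comparison would also first need a rule for deciding when two genes count as "the same".

```r
# Jaccard similarity of two sets of gene names
gene_jaccard <- function(genes_a, genes_b) {
  length(intersect(genes_a, genes_b)) / length(union(genes_a, genes_b))
}

# Hypothetical genomes: strain_b is strain_a plus five horizontally
# acquired genes (placeholder names)
strain_a <- paste0("gene", 1:20)
strain_b <- c(strain_a, paste0("acquired", 1:5))

similarity <- gene_jaccard(strain_a, strain_b)
print(similarity)

# The same pair of strains judged against two arbitrary cutoffs
thresholds <- c(0.75, 0.90)
same_species <- similarity >= thresholds
print(same_species)
```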
Genes can persist within an environment on much longer timescales than prokaryotic species. When environmental conditions change, selective forces acting on microbial communities will also change, and some species will die off or be reduced to extremely low abundance. The process of HGT allows organisms that are suited to life under the new environmental conditions to inherit biogeochemically important genes from dying species and expand to fill the newly vacated niche. Even though species have been in a constant cycle of creation through mutation and removal by selection for over 4 billion years, the same core set of redox metabolic genes has been retained for nearly the full length of this period (Falkowski et al., 2008). HGT allows genes to evolve and function as autonomous entities, rather than as components of an organism whose success is constrained by the overall organism’s fitness.
Medical microbiology has a longer history of species definition than environmental microbiology, and most established species in this field are defined via assayable chemical functions or physical characteristics. Function is still the most important attribute, and pathogenicity islands can be transferred horizontally to new species that may not be identified by tests designed for the species originally carrying the genes. Exactly which species an organism falls into is not medically important; what matters is whether or not the organism causes illness. Historically, diversity has not been an important factor in medical microbiology, although the gut microbiome now bridges the fields of environmental and medical microbiology. A rock-solid species definition is not important if diversity does not matter. Because diagnosis is often time-sensitive, genetic species definitions will not be useful in medical applications until sequencing and sequence data analysis can give results in the same amount of time as test strips designed to identify a certain species. In environmental microbiology, results are not quite as time-sensitive, and diversity of function and how it fluctuates spatiotemporally are important factors, so some unit of unique life needs to be defined.
Perhaps a clear species definition is not necessary if genes provide more pragmatic measurements of diversity, distribution, and potential function. Rather than arbitrary sequence identity thresholds to determine binning into what counts as a different gene, the definition could be primarily focussed on enzymatic function, with sequence similarity considered using a large threshold value as only a broad preliminary filter. Portions of enzymatic pathways are often lost to single genomes due to genome streamlining (Giovannoni, 2017), but the function of the whole pathway will be retained, as genes lost to a single organism will be distributed across the community (Morris et al., 2012). Combined with the distribution of similar and identical genes through HGT, gene diversity and distribution within a community provide a much more comprehensive view of environmental function than species observations.
Prokaryotes do not fit the multicellular system for species identification and no alternative has been universally agreed upon. Morphological features are insufficiently diverse, unless only a small pool of possible organisms is considered for identification, as with medical microbiology. Genetic systems of classification require a standardized portion of the genome to be observed, a standardized method of similarity measurement, and a standardized threshold for where the division between species lies. Sequencing error and horizontal gene transfer further complicate genetic species definitions as genome sequence does not follow a single path of phylogenetic inheritance to a last common ancestor. HGT provides a process that allows environmentally important genes to persist on a much longer timescale than the survival of a single species. This transfer of important genes highlights the significance of functions of specific genes and the disconnect between function and any definition of species. Defining genes as the smallest unit of life provides increased taxonomic resolution while maintaining a direct relationship to observed biogeochemical functions.
Haya Abuzuluf
Jack Anthony
Judy Ban
Ryan Nah
Ryan Lou
Sawera Dhaliwal
Water samples from various depths of Saanich Inlet were analyzed by 16S iTag amplicon sequencing and processed independently with both mothur and QIIME2. Saanich Inlet is a model ecosystem for studying the effects of expanding oxygen minimum zones in the open ocean. Our taxon of interest was Sulfurimonas, a genus encompassing several species of sulfur-oxidizing Epsilonproteobacteria. Associations between Sulfurimonas abundance and oxygen, sulfide, and nitrate concentrations were tested using linear regression models in R. Statistically significant associations with likely biological relevance were found using mothur but not QIIME2. Statistically significant correlations between the abundances of individual OTUs or ASVs and nutrient concentrations were likewise found, with more identified by mothur than by QIIME2. Sequence processing with mothur and QIIME2 therefore led to very different conclusions, suggesting that the two fundamentally different analysis philosophies can produce very different results. However, the statistical and methodological weaknesses of this study do not support strong claims for or against either analysis pipeline. We discuss these shortcomings and possible directions for future work that builds on our results.
Saanich Inlet is a seasonally anoxic fjord off the southeastern coast of Vancouver Island, British Columbia [1]. During fall and winter, strong winds mix water from the Strait of Georgia into the inlet and replace bottom water [2]. As the weather calms in spring, deep water in the inlet is retained by a shallow sill near the inlet mouth and by increasing stratification. High primary productivity exports organic matter from the euphotic zone. Microbial remineralization of this organic matter then consumes oxygen with depth. This respiration is sufficient to create hypoxic conditions below 100m and anoxic conditions below 150m [1]. Saanich Inlet's predictable, recurring anoxia makes it an excellent model ecosystem for studying processes that occur in anoxic marine environments around the world.
Heterotrophic bacteria use organic carbon as an energy source in aerobic respiration. As oxygen availability drops, methane, ammonia, and hydrogen sulfide build up as products of anaerobic metabolism [1]. Globally, this process creates bands of hypoxic water between 100m and 1000m in the open ocean. These oxygen minimum zones (OMZs) are expanding in all oceans and are expected to keep expanding as a consequence of anthropogenic climate change [3]. In Saanich Inlet, the seasonal flushing of new water into the deep basin removes accumulated metabolites and replenishes the oxygen supply each winter. This predictable physical and chemical cycling makes Saanich Inlet an ideal model for the biological processes that occur in OMZs globally. The species present in a particular water sample are largely determined by the availability of specific terminal electron acceptors (TEAs). Saanich Inlet offers an annually reset gradient of TEAs with depth, making it an excellent system for studying how community structure changes with depth or oxygen.
Operational taxonomic units (OTUs) are a taxonomic grouping used for both metagenomic and 16S amplicon sequencing data. Sequences that fall within a set similarity limit are clustered together into an OTU [4]. Clustering mitigates sequencing error, because erroneous reads are grouped with the true sequence they most resemble rather than being counted as distinct taxa. However, if the similarity cutoff is not sufficiently stringent, multiple species may be clustered together. If the cutoff is too strict, a single species may be split into multiple OTUs. Additionally, each OTU is a cluster of the distinct sequences found in one specific dataset, so OTU abundances are not comparable across studies and datasets. Despite these issues, OTUs generated at a 97% sequence similarity (3% dissimilarity) cutoff have long been the de facto proxy for species identity in microbial community analysis.
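The clustering idea can be sketched in R. This is a minimal, hypothetical illustration of greedy centroid clustering at a 97% identity cutoff, assuming reads are already aligned to equal length. It is not mothur's algorithm; mothur has its own clustering methods, such as OptiClust. `identity_frac()`, `greedy_otus()` and `substitute_bases()` are made-up helpers.

```r
# Fraction of matching positions between two pre-aligned, equal-length sequences.
identity_frac <- function(a, b) {
  mean(strsplit(a, "")[[1]] == strsplit(b, "")[[1]])
}

# Greedy centroid clustering (illustrative only): visit sequences from most to
# least abundant; join the first centroid within the cutoff, else start a new OTU.
greedy_otus <- function(seqs, counts, cutoff = 0.97) {
  centroids <- character(0)
  otu <- integer(length(seqs))
  for (i in order(counts, decreasing = TRUE)) {
    hit <- 0L
    for (k in seq_along(centroids)) {
      if (identity_frac(seqs[i], centroids[k]) >= cutoff) {
        hit <- k
        break
      }
    }
    if (hit == 0L) {
      centroids <- c(centroids, seqs[i])
      hit <- length(centroids)
    }
    otu[i] <- hit
  }
  otu
}

# Hypothetical helper: swap the base at each given position (A -> C, otherwise -> A).
substitute_bases <- function(s, pos) {
  x <- strsplit(s, "")[[1]]
  x[pos] <- ifelse(x[pos] == "A", "C", "A")
  paste(x, collapse = "")
}

# Toy data: a 100-bp reference, a 1-mismatch variant and a 5-mismatch variant.
ref <- paste(rep("ACGT", 25), collapse = "")
seqs <- c(ref, substitute_bases(ref, 10), substitute_bases(ref, c(1, 20, 40, 60, 80)))
greedy_otus(seqs, counts = c(50, 3, 20))  # 1 1 2: the 99%-identical read joins OTU 1
```

The 95%-identical variant founds its own OTU here. At a looser cutoff, such as 0.95, it would merge with the reference. This is the cutoff sensitivity described above.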
Recently, an alternative to OTUs has been proposed: the amplicon sequence variant, or ASV [5]. ASVs rest on the idea that a true sequence should appear more often than erroneous variants of it. Sequences judged real on this basis are retained, and those judged erroneous are discarded. Although this discards a large amount of sequence information, the retained sequences are comparable between datasets, because each ASV corresponds to a specific genetic sequence. An ASV thus represents an actual sequence independent of any given dataset, which is arguably superior to the OTU definition. Whether ASVs confer practical advantages over OTUs has not yet been conclusively determined.
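The abundance argument behind ASVs can be illustrated with a deliberately simplified rule. A read is treated as an error if a much more abundant retained sequence differs from it by at most one base. This is a hypothetical heuristic, not the method used in this study. QIIME2 denoises through plugins such as DADA2 or Deblur, and DADA2 fits a statistical error model rather than applying a fixed abundance ratio. The `skew = 8` ratio and the helper names are assumptions.

```r
# Number of mismatched positions between two equal-length sequences.
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Simplified abundance-skew rule (hypothetical): a sequence is called an error if a
# kept sequence at least `skew` times more abundant lies within `max_diff` mismatches.
denoise_by_abundance <- function(seqs, counts, max_diff = 1, skew = 8) {
  keep <- logical(length(seqs))
  for (i in order(counts, decreasing = TRUE)) {
    parents <- which(keep & counts >= skew * counts[i])
    near_parent <- vapply(parents,
                          function(p) hamming(seqs[i], seqs[p]) <= max_diff,
                          logical(1))
    keep[i] <- !any(near_parent)
  }
  data.frame(sequence = seqs[keep], count = counts[keep], stringsAsFactors = FALSE)
}

# Hypothetical helper: swap the base at each given position (A -> C, otherwise -> A).
substitute_bases <- function(s, pos) {
  x <- strsplit(s, "")[[1]]
  x[pos] <- ifelse(x[pos] == "A", "C", "A")
  paste(x, collapse = "")
}

ref <- paste(rep("ACGT", 25), collapse = "")
seqs <- c(ref,
          substitute_bases(ref, 10),                     # 1 mismatch, rare
          substitute_bases(ref, 50),                     # 1 mismatch, common
          substitute_bases(ref, c(1, 20, 40, 60, 80)))   # 5 mismatches, rare
asvs <- denoise_by_abundance(seqs, counts = c(100, 5, 40, 2))
asvs$count  # 100 40 2: only the rare 1-mismatch variant is discarded as an error
```

Unlike a 97% OTU, the common 1-mismatch variant survives here as its own exact sequence. The rare variant is dropped outright, which shows the information loss the paragraph mentions.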
Sulfurimonas was selected as our taxon of interest because of its unusual metabolic pathways and their effects on the environment. Members of this genus have spiral or curved rod-shaped cells with one or two flagella for motility. These chemolithoautotrophs reduce nitrate and nitrite using reduced sulfur compounds or H2 as electron donors [6]. Coupling the nitrogen and sulfur cycles in a single bacterium suggests an interesting interplay between its abundance and the availability of both electron acceptors and electron donors. Sulfurimonas would be expected to increase in abundance with nitrate and reduced sulfur (e.g., sulfide) concentrations. It would be expected to decrease in the presence of oxygen, where it may be outcompeted by organisms better suited to the more oxidizing surface waters. However, interactions with other species in the anoxic region, combined with nitrate limitation near the bottom, may cause its abundance to peak at intermediate depths before declining in the deepest part of the water column.
The environmental samples used in this project were collected through monthly time-series monitoring of Saanich Inlet aboard the MSV John Strickland at station S3 (48°35.500 N, 123°30.300 W) [1]. Samples for large-volume (LV) SSU rRNA gene tags, metagenomics, metatranscriptomics, and metaproteomics were taken from seven major depths spanning the oxycline (10, 100, 120, 135, 150, 165, and 200m). Large-volume water samples were collected in 2 × 12 L Go-Flo bottles on a wire. They were transferred into 2L Nalgene bottles through sterile silicone tubing immediately after sampling for dissolved gases, to minimise changes in microbial gene expression. Biomass was collected from the water samples on 0.22 μm Sterivex filters. Genomic DNA was then extracted from these filters and used to generate small subunit ribosomal RNA (SSU or 16S/18S rRNA) gene pyrotag libraries. PCR amplification targeting the V4-V5 region of the SSU rRNA gene was performed to generate iTag amplicon datasets. Samples were sequenced according to the standard operating protocol on an Illumina MiSeq platform (2 × 300 bp) at the Joint Genome Institute (JGI). Sequences were processed through both mothur [7] and QIIME2 [8], with parameters kept as consistent as possible. Processing yielded two phyloseq objects, which were used in all subsequent analyses.
Analysis was completed in R v3.4.3 [9] using the following packages.
library("tidyverse")
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.8.0 v stringr 1.3.0
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("phyloseq")
library("vegan")
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
library("corrplot")
## Warning: package 'corrplot' was built under R version 3.4.4
## corrplot 0.84 loaded
Sample data from mothur and QIIME2 were normalized and organized into tabular form with the data.frame() function, with cases represented in rows and measurements in columns. Relative abundances of our taxon of interest were compared across nutrient gradients using linear regression. Linear models describe a continuous response variable as a function of one or more predictor variables [10]. They suited our analyses because each nutrient concentration is a continuous, ordered variable.
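Below is a minimal sketch of the kind of model described, using base R's `lm()`. The values for the seven depths are invented. The column names mirror those in the regression tables, but the numbers are hypothetical, not Saanich Inlet data.

```r
# Hypothetical toy data for seven depths; NOT the Saanich Inlet measurements.
set.seed(1)
toy <- data.frame(
  O2_uM  = c(220, 80, 40, 15, 5, 2, 0),
  H2S_uM = c(0, 0, 0, 0, 1, 3, 8),
  NO3_uM = c(20, 28, 25, 20, 12, 5, 0)
)
toy$rel_abund <- 0.005 + 0.002 * toy$H2S_uM + rnorm(nrow(toy), sd = 0.001)

# Relative abundance modelled as a linear function of all three nutrients.
fit <- lm(rel_abund ~ O2_uM + H2S_uM + NO3_uM, data = toy)

# Estimate, Std. Error, t value and Pr(>|t|) for each term: the same quantities
# reported in the regression tables of this report.
coefs <- summary(fit)$coefficients
round(coefs, 5)
```

With seven depths and four coefficients (intercept plus three nutrients), only three residual degrees of freedom remain. That is worth keeping in mind when reading the p-values.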
Figure 1. Nutrient concentrations plotted against depth.
At the phylum level, the overall microbial community structure determined by mothur changes slightly from 10m to 100m and then remains relatively consistent at all lower depths (Fig. 2). Most depths are dominated by large populations of Proteobacteria and Bacteroidetes, with Bacteroidetes giving way to Proteobacteria as depth increases. Similar patterns are seen in the QIIME2-processed data (Fig. 3).
Figure 2. Relative abundance of phyla present in Saanich Inlet based on sample depth as determined by mothur.
Figure 3. Relative abundance of phyla present in Saanich Inlet based on sample depth as determined by QIIME2.
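Relative abundance at the phylum level amounts to summing OTU counts by phylum and dividing by each sample's total. The following base-R sketch uses a hypothetical count matrix. In phyloseq, the equivalent steps would typically be `tax_glom()` followed by `transform_sample_counts()`, although the exact calls used for Figures 2 and 3 are not shown here.

```r
# Hypothetical OTU-by-sample count matrix (rows = OTUs, columns = samples).
counts <- matrix(c(30, 10, 40, 20,    # 10m sample
                   5, 25, 50, 20),    # 200m sample
                 nrow = 4,
                 dimnames = list(paste0("OTU", 1:4), c("10m", "200m")))
phylum <- c("Bacteroidetes", "Bacteroidetes", "Proteobacteria", "Other")

# Sum OTU counts within each phylum, then divide each column by its sample total.
phylum_counts <- rowsum(counts, group = phylum)
rel_abund <- sweep(phylum_counts, 2, colSums(phylum_counts), "/")
round(rel_abund, 2)
```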
In the mothur-processed data, Chao1 richness (an estimate of the number of OTUs) rises to a peak at 100m. It then decreases from 120m toward the deepest parts of the inlet, with a small increase at 200m (Fig. 4). Species diversity, as measured by the Inverse Simpson index, also peaks at 100m and then decreases with increasing depth. Because oxygen concentration decreases with depth in Saanich Inlet (Fig. 1), higher oxygen concentrations are broadly associated with both greater richness and greater diversity in these samples.
Figure 4. Alpha Diversity Indicators - Chao1 and Inverse Simpson Index plotted against depth and coloured to compare with [O2] from mothur-processed data.
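For reference, the two alpha-diversity indices in Figures 4 and 5 can be written out directly. This sketch uses the bias-corrected form of Chao1 on hypothetical counts. phyloseq's `estimate_richness()` computes both indices through vegan, and the exact Chao1 variant it reports may differ slightly from this one.

```r
# Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
# where F1 and F2 are the numbers of singleton and doubleton OTUs.
chao1 <- function(counts) {
  counts <- counts[counts > 0]
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  length(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Inverse Simpson index: 1 / sum(p_i^2), where p_i are relative abundances.
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

sample_counts <- c(10, 5, 3, 2, 1, 1, 1, 2)  # hypothetical OTU counts for one sample
chao1(sample_counts)        # 9: 8 observed OTUs + 3 * 2 / (2 * 3)
inv_simpson(sample_counts)  # about 4.31
```

Chao1 depends only on how many OTUs are present and how many are rare. Inverse Simpson depends on how evenly the reads are spread. This is why the two indices can peak at different depths, as in Figure 5.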
In the QIIME2-processed data, Chao1 richness peaks at 120m and, as with mothur, then decreases with increasing depth before rising slightly at 200m (Fig. 5). Diversity, on the other hand, peaks at 10m and then decreases monotonically with depth. Diversity and richness therefore appear less strongly correlated in the QIIME2 dataset. However, the overall trends in richness and diversity in Saanich Inlet are observable regardless of analysis pipeline.
Figure 5. Alpha Diversity Indicators - Chao1 and Inverse Simpson Index plotted against depth and coloured to compare with [O2] from QIIME2-processed data.
In the mothur-processed data, correlation matrix analysis indicates that Sulfurimonas abundance increases with depth and sulfide concentration and decreases with oxygen and nitrate concentrations (Fig. 6). Indeed, linear regression of Sulfurimonas relative abundance on nutrient concentrations indicates that each of the three nutrients is significantly related to Sulfurimonas relative abundance (Fig. 8; p < 0.01 after FDR correction; Table 1). The same overall pattern is observable in the QIIME2-processed data. There, however, the correlations differ in strength (Fig. 7) and are not statistically significant under multiple linear regression (Fig. 9; p > 0.05 after FDR correction; Table 2).
Figure 6. Correlation matrix of nutrient concentration and Sulfurimonas abundance as determined with mothur.
Figure 7. Correlation matrix of nutrient concentration and Sulfurimonas abundance as determined with QIIME2.
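Figures 6 and 7 were drawn with corrplot, but the matrix underneath is an ordinary correlation matrix. The sketch below computes one from hypothetical values, not the study's data. It assumes Pearson correlation, since the method used is not stated.

```r
# Hypothetical values for seven depths; NOT the study's measurements.
toy <- data.frame(
  Sulfurimonas = c(0.000, 0.001, 0.003, 0.006, 0.010, 0.015, 0.020),
  O2_uM  = c(220, 80, 40, 15, 5, 2, 0),
  H2S_uM = c(0, 0, 0, 0, 1, 3, 8),
  NO3_uM = c(20, 28, 25, 20, 12, 5, 0)
)

# Pairwise Pearson correlations between every pair of columns.
cor_mat <- cor(toy, method = "pearson")
round(cor_mat, 2)

# With corrplot installed, corrplot::corrplot(cor_mat) draws a figure of this kind.
```

A correlation matrix looks at one pair of variables at a time. The regressions below instead estimate each nutrient's effect while holding the others fixed, so the two approaches need not agree.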
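
Table 1. Multiple linear regression of Sulfurimonas relative abundance on oxygen, sulfide, and nitrate concentrations (µM), mothur-processed data.

| Term | Estimate | Std. Error | t value | p value | FDR-corrected p |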
|---|---|---|---|---|---|
| (Intercept) | 0.0224803 | 0.0022788 | 9.865069 | 0.0022148 | 0.0044296 |
| O2_uM | -0.0000980 | 0.0000156 | -6.286284 | 0.0081297 | 0.0081297 |
| H2S_uM | 0.0023563 | 0.0002058 | 11.449147 | 0.0014300 | 0.0044296 |
| NO3_uM | -0.0007633 | 0.0001212 | -6.300053 | 0.0080795 | 0.0081297 |
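
Table 2. Multiple linear regression of Sulfurimonas relative abundance on oxygen, sulfide, and nitrate concentrations (µM), QIIME2-processed data.

| Term | Estimate | Std. Error | t value | p value | FDR-corrected p |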
|---|---|---|---|---|---|
| (Intercept) | 0.0809162 | 0.0241215 | 3.354527 | 0.0439111 | 0.1756445 |
| O2_uM | -0.0003616 | 0.0001650 | -2.192073 | 0.1160289 | 0.2320579 |
| H2S_uM | 0.0029215 | 0.0021785 | 1.341047 | 0.2723990 | 0.2723990 |
| NO3_uM | -0.0020102 | 0.0012825 | -1.567472 | 0.2149981 | 0.2723990 |
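The FDR-corrected column in Table 1 appears consistent with Benjamini-Hochberg adjustment over all four coefficients, intercept included. The text does not name the correction method, so this is an inference. The snippet below re-derives that column from the listed raw p-values using base R's `p.adjust()`.

```r
# Raw p-values copied from Table 1 (mothur-processed data).
p_raw <- c("(Intercept)" = 0.0022148, O2_uM = 0.0081297,
           H2S_uM = 0.0014300, NO3_uM = 0.0080795)

# Benjamini-Hochberg false discovery rate adjustment across the four tests.
p_bh <- p.adjust(p_raw, method = "BH")
round(p_bh, 7)  # 0.0044296 0.0081297 0.0044296 0.0081297, matching Table 1
```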
Figure 8. Relative abundance of Sulfurimonas plotted against nutrient concentration from the mothur-processed dataset.
Figure 9. Relative abundance of Sulfurimonas plotted against nutrient concentration from the QIIME2-processed dataset.
Mothur identified eight OTUs classified as Sulfurimonas, all of which were detected only at depths below 100m (Fig. 10). As at the genus level, the abundances of most of these OTUs increase with depth, and all but one were present at the deepest depth. Similarly, QIIME2 identified seven Sulfurimonas ASVs (Fig. 11), so the Sulfurimonas richness determined by the two analysis pipelines is roughly comparable. However, fewer ASVs increased in abundance with depth, and just under half were detected in the 200m sample.
Figure 10. Relative abundances of each Sulfurimonas OTU as determined by mothur, coloured by oxygen concentration at the sample depth.
Figure 11. Relative abundances of each Sulfurimonas ASV as determined by QIIME2, coloured by oxygen concentration at the sample depth.
Multiple linear regression of the mothur-processed data indicated that the abundances of 7 of the 8 identified OTUs correlated significantly with one or more of the tested nutrients (NO3-, H2S, and O2) after false discovery rate (FDR) correction. The most common significant association was with sulfide (Table 3). A representative OTU, OTU0308, which correlated significantly with all three nutrient variables, is shown in Figure 12. In comparison, the QIIME2-processed data showed no significant correlations with either oxygen or nitrate concentration. Only 3 of the 7 identified ASVs correlated significantly with sulfide concentration (Table 4). A representative ASV, ASV1250, whose abundance correlated with sulfide concentration, is shown in Figure 13.
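The per-OTU analysis amounts to fitting one regression per OTU and pooling all the resulting p-values before FDR correction. Judging from the reported numbers, the FDR-corrected column of Table 3 appears consistent with Benjamini-Hochberg adjustment across all 24 OTU-by-variable tests. The method is not named in the text, so treat that as an inference. The sketch below reproduces the workflow on invented data for three hypothetical OTUs.

```r
# Hypothetical nutrient profile for seven depths; NOT the study's measurements.
set.seed(42)
env <- data.frame(
  O2_uM  = c(220, 80, 40, 15, 5, 2, 0),
  H2S_uM = c(0, 0, 0, 0, 1, 3, 8),
  NO3_uM = c(20, 28, 25, 20, 12, 5, 0)
)

# Hypothetical relative abundances of three OTUs (columns) at the same depths.
otus <- data.frame(
  OtuA = 0.001 + 0.001 * env$H2S_uM + rnorm(7, sd = 0.0005),
  OtuB = 0.003 - 0.00001 * env$O2_uM + rnorm(7, sd = 0.0005),
  OtuC = runif(7, min = 0, max = 0.001)
)

# Fit one multiple regression per OTU and keep the three nutrient coefficients.
results <- do.call(rbind, lapply(names(otus), function(otu) {
  fit <- lm(otus[[otu]] ~ O2_uM + H2S_uM + NO3_uM, data = env)
  cf <- summary(fit)$coefficients[-1, , drop = FALSE]  # drop the intercept row
  data.frame(OTU = otu, Variable = rownames(cf),
             Estimate = cf[, "Estimate"], p_value = cf[, "Pr(>|t|)"],
             row.names = NULL, stringsAsFactors = FALSE)
}))

# Benjamini-Hochberg correction pooled over every OTU-by-variable test.
results$FDR_corrected <- p.adjust(results$p_value, method = "BH")
results
```

Pooling means the correction grows with the number of OTUs tested. For that reason, a pipeline that reports more taxa faces a stricter bar for each individual test.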
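
Table 3. Multiple linear regression of the relative abundance of each Sulfurimonas OTU on nutrient concentrations, mothur-processed data.

| OTU | Variable | Estimate | Std. Error | t value | p value | FDR-corrected p |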
|---|---|---|---|---|---|---|
| Otu0308 | O2 | -0.0000601 | 0.0000080 | -7.4950313 | 0.0049204 | 0.0168698 |
| Otu0308 | H2S | 0.0013338 | 0.0001058 | 12.6015568 | 0.0010776 | 0.0113253 |
| Otu0308 | NO3 | -0.0004953 | 0.0000623 | -7.9496787 | 0.0041516 | 0.0166066 |
| Otu0666 | O2 | -0.0000203 | 0.0000046 | -4.4026333 | 0.0217284 | 0.0474074 |
| Otu0666 | H2S | 0.0000584 | 0.0000609 | 0.9589865 | 0.4083113 | 0.4666414 |
| Otu0666 | NO3 | -0.0001611 | 0.0000358 | -4.4974784 | 0.0205213 | 0.0474074 |
| Otu0704 | O2 | 0.0000061 | 0.0000063 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu0704 | H2S | 0.0008052 | 0.0000834 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu0704 | NO3 | 0.0000623 | 0.0000491 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu0751 | O2 | -0.0000269 | 0.0000055 | -4.8516438 | 0.0167137 | 0.0445698 |
| Otu0751 | H2S | -0.0002618 | 0.0000731 | -3.5797332 | 0.0372935 | 0.0745869 |
| Otu0751 | NO3 | -0.0002206 | 0.0000430 | -5.1242045 | 0.0143892 | 0.0431675 |
| Otu1315 | O2 | 0.0000022 | 0.0000023 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu1315 | H2S | 0.0002899 | 0.0000300 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu1315 | NO3 | 0.0000224 | 0.0000177 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu2793 | O2 | 0.0000005 | 0.0000005 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu2793 | H2S | 0.0000644 | 0.0000067 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu2793 | NO3 | 0.0000050 | 0.0000039 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu3512 | O2 | 0.0000000 | 0.0000033 | 0.0119645 | 0.9912051 | 0.9912051 |
| Otu3512 | H2S | 0.0000020 | 0.0000439 | 0.0449804 | 0.9669496 | 0.9912051 |
| Otu3512 | NO3 | 0.0000191 | 0.0000258 | 0.7396834 | 0.5131159 | 0.5597628 |
| Otu3610 | O2 | 0.0000005 | 0.0000005 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu3610 | H2S | 0.0000644 | 0.0000067 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu3610 | NO3 | 0.0000050 | 0.0000039 | 1.2682777 | 0.2941783 | 0.4412674 |
Figure 12. Relative abundance of OTU0308 versus nutrient concentration, with points coloured by depth.
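
Table 4. Multiple linear regression of the relative abundance of each Sulfurimonas ASV on nutrient concentrations, QIIME2-processed data.

| ASV | Variable | Estimate | Std. Error | t value | p value | FDR-corrected p |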
|---|---|---|---|---|---|---|
| Asv277 | O2 | -0.0003010 | 0.0001398 | -2.1533547 | 0.1203267 | 0.5053720 |
| Asv277 | H2S | -0.0040259 | 0.0018458 | -2.1810766 | 0.1172306 | 0.5053720 |
| Asv277 | NO3 | -0.0019546 | 0.0010866 | -1.7987740 | 0.1698884 | 0.5351818 |
| Asv561 | O2 | 0.0000006 | 0.0000519 | 0.0119645 | 0.9912051 | 0.9912051 |
| Asv561 | H2S | 0.0000308 | 0.0006857 | 0.0449804 | 0.9669496 | 0.9912051 |
| Asv561 | NO3 | 0.0002986 | 0.0004037 | 0.7396834 | 0.5131159 | 0.5671281 |
| Asv578 | O2 | 0.0000150 | 0.0000156 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv578 | H2S | 0.0019946 | 0.0002066 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv578 | NO3 | 0.0001543 | 0.0001216 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1153 | O2 | 0.0000333 | 0.0000347 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv1153 | H2S | 0.0044271 | 0.0004586 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv1153 | NO3 | 0.0003424 | 0.0002700 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1250 | O2 | 0.0000154 | 0.0000160 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv1250 | H2S | 0.0020433 | 0.0002116 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv1250 | NO3 | 0.0001580 | 0.0001246 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1620 | O2 | -0.0000548 | 0.0000487 | -1.1252336 | 0.3423877 | 0.5351818 |
| Asv1620 | H2S | -0.0007710 | 0.0006433 | -1.1986359 | 0.3167203 | 0.5351818 |
| Asv1620 | NO3 | -0.0002880 | 0.0003787 | -0.7606489 | 0.5021883 | 0.5671281 |
| Asv2216 | O2 | -0.0000702 | 0.0000731 | -0.9602731 | 0.4077575 | 0.5351818 |
| Asv2216 | H2S | -0.0007774 | 0.0009655 | -0.8051467 | 0.4796349 | 0.5671281 |
| Asv2216 | NO3 | -0.0007209 | 0.0005684 | -1.2682777 | 0.2941783 | 0.5351818 |
Figure 13. Relative abundance of ASV1250 versus nutrient concentration, with points coloured by depth.
Thus, the overall trends in abundance across depth, [H2S], [NO3-], and [O2] appear similar for mothur- and QIIME2-processed data. Even so, many more relationships were statistically significant in the mothur-processed dataset, both for genus-level relative abundance and at the OTU/ASV level. These abundance patterns are consistent with the tendency of the Sulfurimonas genus to inhabit anoxic/sulfidogenic regions of marine basins, including oxic-anoxic interfaces and hydrothermal vents [13].
Within the Sulfurimonas genus, the richness determined from mothur- and QIIME2-processed data was highly comparable (eight OTUs versus seven ASVs). Data from both pipelines revealed the same overall trends in community structure with respect to nutrient concentration. However, a far smaller proportion of the differences was statistically significant in the QIIME2 data. In general, richness and diversity rose to a peak and then decreased with increasing depth (Figures 4-5). Sulfurimonas abundance increased with depth and sulfide concentration, and decreased with oxygen and nitrate concentrations (Figures 6-9).
Most of the trends observed in this study were consistent with our hypotheses. As a chemolithoautotroph, Sulfurimonas is expected to be most abundant in deep, anoxic waters. Sulfurimonas gains energy by reducing nitrate and nitrite while oxidizing sulfur-containing compounds or hydrogen [6]. Thus, as hypothesized, Sulfurimonas relative abundance was significantly lower where oxygen was present. There it is likely outcompeted by organisms that can use oxygen as a terminal electron acceptor. Likewise, Sulfurimonas relative abundance was significantly higher where sulfide was present. Unexpectedly, however, Sulfurimonas abundance was significantly negatively associated with nitrate concentration. At depths where nitrate is high, sulfide is low (Fig. 1), so organisms that use nitrate more efficiently may be outcompeting Sulfurimonas there. The association may also be driven by confounding environmental factors that were not studied in this analysis. Examples include temperature, or metabolites produced by microbes adapted to particular anoxic regions of the water column. Regardless, this observation was contrary to expectations, and no single explanation can be drawn from the available data. It does suggest that the abundance of Sulfurimonas in Saanich Inlet varies as a function of many nutrient concentrations, not just oxygen.
There appears to be a correlation between the stratified layers of the water column and Sulfurimonas distribution. Samples used in this study were taken at a time of year when a thermocline was present at around 10m-100m. The temperature at 10m is substantially higher than at greater depths. Diversity was also high near the surface, peaking at 10m in the QIIME2-processed data. This high diversity may reflect ambient temperatures favourable to bacterial growth, with few limitations. Diversity also tended to decrease along with richness at greater depths, with minima coinciding with the boundary of the thermocline. The low diversity at depth could also be explained by the high specialization required to thrive in such extreme environments. For instance, our taxon of interest, Sulfurimonas, is commonly found near deep-sea hydrothermal vents. It functions primarily by sulfide oxidation, a process favoured by the high sulfide concentrations in that habitat.
The overall trends in our results align with our hypotheses and previous literature [6]. The statistical tests also show that some of the expected correlations are significant. However, several methodological issues make us less confident in the replicability of our results. A linear regression model assumes that the relationships between the tested variables are linear, which is unlikely to hold in a complex biological system. For example, interspecies competition may make intermediate concentrations of a nutrient such as nitrate the most favourable. At higher concentrations, other denitrifying bacteria might outcompete Sulfurimonas. At concentrations that are too low, Sulfurimonas would be unable to grow at all. Its niche would then exist only at intermediate concentrations. Other confounding environmental variables will always be left out of such a model, so the apparent success of the linear model is somewhat surprising. Future investigations should consider statistical models that better capture the relationships being examined.
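The concern about linearity can be checked directly. The sketch below uses hypothetical, hump-shaped data invented for illustration, not taken from Saanich Inlet. A straight line fitted to a response that peaks at intermediate nitrate finds essentially no trend, while adding a squared term captures the peak. `anova()` then compares the two nested models with an F-test.

```r
# Hypothetical hump-shaped response: abundance peaks at intermediate nitrate.
nitrate <- c(0, 5, 10, 15, 20, 25, 30)
abund   <- c(0.002, 0.010, 0.016, 0.018, 0.015, 0.009, 0.003)

linear_fit <- lm(abund ~ nitrate)                 # straight line only
quad_fit   <- lm(abund ~ nitrate + I(nitrate^2))  # allows a single peak

coef(linear_fit)[["nitrate"]]  # essentially zero: the line sees no trend
anova(linear_fit, quad_fit)    # F-test for the added squared term
```

With only seven depths, every extra term uses up a residual degree of freedom. A more flexible model therefore needs more samples to be tested with any power.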
We are unable to draw definitive conclusions about which pipeline used in this study is superior. However, clear differences were observed in the results obtained from each, specifically with regard to the Sulfurimonas genus. Both suggested similar overall relationships between nutrient concentration and Sulfurimonas abundance. Whether those relationships were statistically significant, however, differed greatly between the two analysis pipelines. Because of the statistical flaws described above, we cannot confidently say whether one pipeline is better than the other. It is clear, though, that the two produce different results and that the choice of pipeline matters greatly in microbial ecology. Had multiple comparisons not been controlled for, more test conditions would have reached significance. This highlights the importance of statistical rigour in the experimental methodology, and the benefit of consulting a statistician during experimental design.
Each pipeline handles the key analytical step of sequence processing differently and has its own guidelines and configuration profiles. Choosing an appropriate pipeline, with suitable parameters and algorithms for a given application, is therefore important. Parameters and databases were kept as similar as possible between the two pipelines in this study. Even so, the pipelines differ fundamentally in how they handle data. mothur clusters similar sequences into OTUs, whereas QIIME2, as used here, resolves ASVs by separating true sequences from errors. Pipeline comparisons have shown that sequencing errors had a bigger impact on results than the choice of gene region for amplification [11]. In addition, selecting a pipeline with higher sequence throughput could increase the chance of overestimating richness [11]. Because analytical steps are central to microbial ecology research, variation in the quality of reference databases and their annotations could affect the validity of results. For pipelines such as mothur and QIIME2, the comprehensiveness and sensitivity of the chosen reference database affect the accuracy of microbial abundance estimates [12]. A standardized evaluation protocol may help resolve the dilemma of pipeline selection. Regardless of the pipeline chosen, pipeline methodology should be described completely, with every decision within the pipeline well justified.
To build on these results, future work should analyse the effect of each of the many environmental variables on Sulfurimonas abundance using a stronger multivariate statistical approach. Additionally, processing further datasets with mothur and QIIME2 in parallel would support stronger claims about the relative performance of the two pipelines. It would also inform comparisons of OTU- and ASV-based analysis pipelines more generally.
Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data. 4:170159. doi:10.1038/sdata.2017.159.
Ocean Networks Canada. 2013. Introduction to Saanich Inlet. Retrieved from http://www.oceannetworks.ca/introduction-saanich-inlet.
Stramma L, Schmidtko S, Levin LA, Johnson GC. 2010. Ocean oxygen minima expansions and their biological impacts. Deep Sea Res Part 1 Oceanogr Res Pap. 57(4):587-595. doi:10.1016/j.dsr.2010.01.005.
Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS Comput Biol. 6(2):e1000667. doi:10.1371/journal.pcbi.1000667.
Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 11(12):2639. doi:10.1038/ismej.2017.119.
Labrenz M, Grote J, Mammitzsch K, Boschker HTS, Laue M, Jost G, Glaubitz S, Jürgens K. 2013. Sulfurimonas gotlandica sp. nov., a chemoautotrophic and psychrotolerant epsilonproteobacterium isolated from a pelagic redoxcline, and an emended description of the genus Sulfurimonas. Int J Syst Evol Microbiol. 63(Pt 11):4141-4148. doi:10.1099/ijs.0.048827-0.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537-7541. doi:10.1128/AEM.01541-09
Ortutay C, Ortutay Z. 2017. Introduction to R statistical environment, p 1-15. In Molecular Data Analysis, John Wiley & Sons, Hoboken, New Jersey, USA.
Bingham NH, Fry JM. 2010. Regression: Linear models in statistics. Springer, London; New York.
Siegwald L, Touzet H, Lemoine Y, Hot D, Audebert C, Caboche S. 2017. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics. PLoS One. 12(1):e0169563. doi:10.1371/journal.pone.0169563.
Plummer E, Twin J, Bulach DM, Garland SM, Tabrizi SN. 2015. A Comparison of Three Bioinformatics Pipelines for the Analysis of Preterm Gut Microbiota using 16S rRNA Gene Sequencing Data. J Proteomics Bioinform. 8:283-291.
Sievert SM, Scott KM, Klotz MG, Chain PSG, Hauser LJ, Hemp J, Hügler M, Land M, Lapidus A, Larimer FW, Lucas S, Malfatti SA, Meyer F, Paulsen IT, Ren Q, Simon K, the USF Genomics Class. 2008. Genome of the Epsilonproteobacterial Chemolithoautotroph Sulfurimonas denitrificans. Appl Environ Microbiol. 74(4):1145-1156. doi:10.1128/AEM.01844-07.
Welch et al. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. PNAS. 99(26). DOI: 10.1073/pnas.252529799. https://www.ncbi.nlm.nih.gov/pubmed/12471157
Kim et al. 2014. Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol. 64(Pt 2):346-51. DOI: 10.1099/ijs.0.059774-0
Falkowski PG, Fenchel T, DeLong EF. 2008. The microbial engines that drive Earth's biogeochemical cycles. Science. 320:1034. DOI: 10.1126/science.1153213
Giovannoni SJ. 2017. SAR11 bacteria: The most abundant plankton in the oceans. Annu Rev Mar Sci. 9(1):231-255. DOI: 10.1146/annurev-marine-010814-015934
Martinez et al. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Natl Acad Sci USA. 104(13):5590-5595. DOI: 10.1073/pnas.0611470104
Wooley, Godzik, Friedberg. 2010. A primer on metagenomics. PLoS Comput Biol. 6(2):e1000667. DOI: 10.1371/journal.pcbi.1000667
Kunin et al. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol. 12(1):118-23. DOI: 10.1111/j.1462-2920.2009.02051
Morris, Lenski, Zinser. 2012. The black queen hypothesis: evolution of dependencies through adaptive gene loss. mBio. 3(2):e00036-12. DOI: 10.1128/mBio.00036-12